For a proof of concept, let's join some audio files together and see what the diarization pipeline produces. I put some sample Iriss recordings in the audio directory; let's read them and concatenate them:
In [ ]:
import pydub
from pathlib import Path
audio_files = list(Path("audio").glob("*.wav"))  # match on the .wav extension only
print("Available audio files: ", audio_files)
Available audio files: [PosixPath('audio/Iriss-N-G5035-P600034-avd.wav'), PosixPath('audio/Iriss-N-G5036-P600034-avd.wav'), PosixPath('audio/Iriss-J-Gvecg-P500016-avd.wav')]
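Before concatenating, it may be worth checking that the files share the same channel count, sample width, and sample rate, since mixed formats could affect what the diarization pipeline sees. The following is a minimal sketch using only the standard-library wave module rather than pydub; the helper names wav_format and group_by_format are hypothetical, and it assumes the files are plain PCM WAV (wave raises an error on other encodings).

```python
import wave
from pathlib import Path


def wav_format(path):
    """Return (channels, sample_width_bytes, frame_rate_hz) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def group_by_format(paths):
    """Map each distinct format tuple to the list of files that use it."""
    groups = {}
    for path in paths:
        groups.setdefault(wav_format(path), []).append(Path(path))
    return groups
```

Calling group_by_format(audio_files) and getting back more than one key would mean the files do not all share one format.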
Let's try reading one of them:
In [ ]:
random_audio_segment = pydub.AudioSegment.from_file(audio_files[0])
random_audio_segment
Out[ ]:
Let's read all of them:
In [ ]:
audio_segments = [pydub.AudioSegment.from_file(i) for i in audio_files]
We can concatenate them with the + operator or with sum():
In [ ]:
concatenated_audio = sum(audio_segments)
In [ ]:
concatenated_audio
Out[ ]:
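Since the goal is to see what diarization makes of the joined audio, it helps to know where each source file starts and ends inside the concatenated result. Here is a small sketch of that bookkeeping, assuming durations are given in milliseconds (for pydub segments, len(segment) returns the duration in ms); segment_boundaries is a hypothetical helper name, not part of pydub.

```python
from itertools import accumulate


def segment_boundaries(durations_ms):
    """Return (start_ms, end_ms) for each segment when laid end to end."""
    ends = list(accumulate(durations_ms))  # running total = end of each segment
    starts = [0] + ends[:-1]               # each segment starts where the previous ended
    return list(zip(starts, ends))


# Example with made-up durations (not taken from the Iriss files):
print(segment_boundaries([1000, 2500, 500]))
# [(0, 1000), (1000, 3500), (3500, 4000)]
```

With the notebook's variables, segment_boundaries([len(s) for s in audio_segments]) would give the expected boundaries to compare against the diarization output. Because pydub rounds durations to whole milliseconds, the last end value may differ from len(concatenated_audio) by a millisecond or so.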